To improve uncertainty quantification of variance networks, we propose a novel tree-structured local neural network model that partitions the feature space into multiple regions based on uncertainty heterogeneity. A tree is built upon giving the training data, whose leaf nodes represent different regions where region-specific neural networks are trained to predict both the mean and the variance for quantifying uncertainty. The proposed Uncertainty-Splitting Neural Regression Tree (USNRT) employs novel splitting criteria. At each node, a neural network is trained on the full data first, and a statistical test for the residuals is conducted to find the best split, corresponding to the two sub-regions with the most significant uncertainty heterogeneity. USNRT is computationally friendly because very few leaf nodes are sufficient and pruning is unnecessary. On extensive UCI datasets, in terms of both calibration and sharpness, USNRT shows superior performance compared to some recent popular methods for variance prediction, including vanilla variance network, deep ensemble, dropout-based methods, tree-based models, etc. Through comprehensive visualization and analysis, we uncover how USNRT works and show its merits.
translated by 谷歌翻译
Conversational recommender systems (CRSs) often utilize external knowledge graphs (KGs) to introduce rich semantic information and recommend relevant items through natural language dialogues. However, original KGs employed in existing CRSs are often incomplete and sparse, which limits the reasoning capability in recommendation. Moreover, only few of existing studies exploit the dialogue context to dynamically refine knowledge from KGs for better recommendation. To address the above issues, we propose the Variational Reasoning over Incomplete KGs Conversational Recommender (VRICR). Our key idea is to incorporate the large dialogue corpus naturally accompanied with CRSs to enhance the incomplete KGs; and perform dynamic knowledge reasoning conditioned on the dialogue context. Specifically, we denote the dialogue-specific subgraphs of KGs as latent variables with categorical priors for adaptive knowledge graphs refactor. We propose a variational Bayesian method to approximate posterior distributions over dialogue-specific subgraphs, which not only leverages the dialogue corpus for restructuring missing entity relations but also dynamically selects knowledge based on the dialogue context. Finally, we infuse the dialogue-specific subgraphs to decode the recommendation and responses. We conduct experiments on two benchmark CRSs datasets. Experimental results confirm the effectiveness of our proposed method.
translated by 谷歌翻译
Relation Extraction (RE) has been extended to cross-document scenarios because many relations are not simply described in a single document. This inevitably brings the challenge of efficient open-space evidence retrieval to support the inference of cross-document relations, along with the challenge of multi-hop reasoning on top of entities and evidence scattered in an open set of documents. To combat these challenges, we propose Mr.CoD, a multi-hop evidence retrieval method based on evidence path mining and ranking with adapted dense retrievers. We explore multiple variants of retrievers to show evidence retrieval is an essential part in cross-document RE. Experiments on CodRED show that evidence retrieval with Mr.Cod effectively acquires cross-document evidence that essentially supports open-setting cross-document RE. Additionally, we show that Mr.CoD facilitates evidence retrieval and boosts end-to-end RE performance with effective multi-hop reasoning in both closed and open settings of RE.
translated by 谷歌翻译
关于无监督的域适应性(UDA)的广泛研究已将有限的实验数据集深入学习到现实世界中无约束的领域。大多数UDA接近通用嵌入空间中的对齐功能,并将共享分类器应用于目标预测。但是,由于当域差异很大时可能不存在完全排列的特征空间,因此这些方法受到了两个局限性。首先,由于缺乏目标标签监督,强制域的比对会恶化目标域的可区分性。其次,源监督分类器不可避免地偏向源数据,因此它在目标域中的表现可能不佳。为了减轻这些问题,我们建议在两个集中在不同领域的空间中同时进行特征对齐,并为每个空间创建一个针对该域的面向域的分类器。具体而言,我们设计了一个面向域的变压器(DOT),该变压器(DOT)具有两个单独的分类令牌,以学习不同的面向域的表示形式和两个分类器,以保持域的可区分性。理论保证的基于对比度的对齐和源指导的伪标签细化策略被用来探索域名和特定信息。全面的实验验证了我们的方法在几个基准上实现了最先进的方法。
translated by 谷歌翻译
无监督的域适应(UDA)旨在将知识从标记的源域传输到未标记的目标域。大多数现有的UDA方法通过学习域 - 不变的表示和在两个域中共享一个分类器来实现知识传输。但是,忽略与任务相关的域特定信息,并强制统一的分类器以适合两个域将限制每个域中的特征表达性。在本文中,通过观察到具有可比参数的变压器架构可以产生比CNN对应的更可转换的表示,我们提出了一个双赢的变压器框架(WINTR),它分别探讨了每个域的特定于域的知识,而同时交互式跨域知识。具体而言,我们使用变压器中的两个单独的分类令牌学习两个不同的映射,以及每个特定于域的分类器的设计。跨域知识通过源引导标签改进和与源或目标的单侧特征对齐传输,这保持了特定于域的信息的完整性。三个基准数据集的广泛实验表明,我们的方法优于最先进的UDA方法,验证利用域特定和不变性的有效性
translated by 谷歌翻译
Benefiting from the intrinsic supervision information exploitation capability, contrastive learning has achieved promising performance in the field of deep graph clustering recently. However, we observe that two drawbacks of the positive and negative sample construction mechanisms limit the performance of existing algorithms from further improvement. 1) The quality of positive samples heavily depends on the carefully designed data augmentations, while inappropriate data augmentations would easily lead to the semantic drift and indiscriminative positive samples. 2) The constructed negative samples are not reliable for ignoring important clustering information. To solve these problems, we propose a Cluster-guided Contrastive deep Graph Clustering network (CCGC) by mining the intrinsic supervision information in the high-confidence clustering results. Specifically, instead of conducting complex node or edge perturbation, we construct two views of the graph by designing special Siamese encoders whose weights are not shared between the sibling sub-networks. Then, guided by the high-confidence clustering information, we carefully select and construct the positive samples from the same high-confidence cluster in two views. Moreover, to construct semantic meaningful negative sample pairs, we regard the centers of different high-confidence clusters as negative samples, thus improving the discriminative capability and reliability of the constructed sample pairs. Lastly, we design an objective function to pull close the samples from the same cluster while pushing away those from other clusters by maximizing and minimizing the cross-view cosine similarity between positive and negative samples. Extensive experimental results on six datasets demonstrate the effectiveness of CCGC compared with the existing state-of-the-art algorithms.
translated by 谷歌翻译
Relation extraction (RE), which has relied on structurally annotated corpora for model training, has been particularly challenging in low-resource scenarios and domains. Recent literature has tackled low-resource RE by self-supervised learning, where the solution involves pretraining the relation embedding by RE-based objective and finetuning on labeled data by classification-based objective. However, a critical challenge to this approach is the gap in objectives, which prevents the RE model from fully utilizing the knowledge in pretrained representations. In this paper, we aim at bridging the gap and propose to pretrain and finetune the RE model using consistent objectives of contrastive learning. Since in this kind of representation learning paradigm, one relation may easily form multiple clusters in the representation space, we further propose a multi-center contrastive loss that allows one relation to form multiple clusters to better align with pretraining. Experiments on two document-level RE datasets, BioRED and Re-DocRED, demonstrate the effectiveness of our method. Particularly, when using 1% end-task training data, our method outperforms PLM-based RE classifier by 10.5% and 5.8% on the two datasets, respectively.
translated by 谷歌翻译
Data Augmentation (DA) is frequently used to automatically provide additional training data without extra human annotation. However, data augmentation may introduce noisy data that impairs training. To guarantee the quality of augmented data, existing methods either assume no noise exists in the augmented data and adopt consistency training or use simple heuristics such as training loss and diversity constraints to filter out ``noisy'' data. However, those filtered examples may still contain useful information, and dropping them completely causes loss of supervision signals. In this paper, based on the assumption that the original dataset is cleaner than the augmented data, we propose an on-the-fly denoising technique for data augmentation that learns from soft augmented labels provided by an organic teacher model trained on the cleaner original data. A simple self-regularization module is applied to force the model prediction to be consistent across two distinct dropouts to further prevent overfitting on noisy labels. Our method can be applied to augmentation techniques in general and can consistently improve the performance on both text classification and question-answering tasks.
translated by 谷歌翻译
Contrastive deep graph clustering, which aims to divide nodes into disjoint groups via contrastive mechanisms, is a challenging research spot. Among the recent works, hard sample mining-based algorithms have achieved great attention for their promising performance. However, we find that the existing hard sample mining methods have two problems as follows. 1) In the hardness measurement, the important structural information is overlooked for similarity calculation, degrading the representativeness of the selected hard negative samples. 2) Previous works merely focus on the hard negative sample pairs while neglecting the hard positive sample pairs. Nevertheless, samples within the same cluster but with low similarity should also be carefully learned. To solve the problems, we propose a novel contrastive deep graph clustering method dubbed Hard Sample Aware Network (HSAN) by introducing a comprehensive similarity measure criterion and a general dynamic sample weighing strategy. Concretely, in our algorithm, the similarities between samples are calculated by considering both the attribute embeddings and the structure embeddings, better revealing sample relationships and assisting hardness measurement. Moreover, under the guidance of the carefully collected high-confidence clustering information, our proposed weight modulating function will first recognize the positive and negative samples and then dynamically up-weight the hard sample pairs while down-weighting the easy ones. In this way, our method can mine not only the hard negative samples but also the hard positive sample, thus improving the discriminative capability of the samples further. Extensive experiments and analyses demonstrate the superiority and effectiveness of our proposed method.
translated by 谷歌翻译
Knowledge graph reasoning (KGR), aiming to deduce new facts from existing facts based on mined logic rules underlying knowledge graphs (KGs), has become a fast-growing research direction. It has been proven to significantly benefit the usage of KGs in many AI applications, such as question answering and recommendation systems, etc. According to the graph types, the existing KGR models can be roughly divided into three categories, \textit{i.e.,} static models, temporal models, and multi-modal models. The early works in this domain mainly focus on static KGR and tend to directly apply general knowledge graph embedding models to the reasoning task. However, these models are not suitable for more complex but practical tasks, such as inductive static KGR, temporal KGR, and multi-modal KGR. To this end, multiple works have been developed recently, but no survey papers and open-source repositories comprehensively summarize and discuss models in this important direction. To fill the gap, we conduct a survey for knowledge graph reasoning tracing from static to temporal and then to multi-modal KGs. Concretely, the preliminaries, summaries of KGR models, and typical datasets are introduced and discussed consequently. Moreover, we discuss the challenges and potential opportunities. The corresponding open-source repository is shared on GitHub: https://github.com/LIANGKE23/Awesome-Knowledge-Graph-Reasoning.
translated by 谷歌翻译